Session E-1

E-1: Network Measurement

Conference
11:00 AM — 12:30 PM PDT
Local
May 21 Tue, 2:00 PM — 3:30 PM EDT
Location
Regency F

Robust or Risky: Measurement and Analysis of Domain Resolution Dependency

Shuhan Zhang (Tsinghua University, China); Shuai Wang (Zhongguancun Laboratory, China); Dan Li (Tsinghua University, China)

DNS relies on domain delegation for good scalability, where domains delegate their resolution service to authoritative nameservers. However, such delegations could lead to complex inter-dependencies between DNS zones. While the complex dependency might improve the robustness of domain resolution, it could also introduce security risks unexpectedly. In this work, we perform a large-scale measurement on nearly 217M domains to analyze their resolution dependencies at both zone level and infrastructure level. According to our analysis, domains under country-code TLDs and new generic TLDs present a more complex dependency relationship. For robustness consideration, popular domains prefer to configure more complex dependencies. However, centralized hosting of nameservers and the silent outsourcing of DNS providers could lead to the false redundancy at infrastructure level. Worse, considerable domain configurations in the wild are "not robust but risky": a complex dependency is also likely to bring vulnerabilities, e.g., domains with a 2 times higher dependency complexity have a 2.87 times larger proportion suffering from the hijacking risk via lame delegation.
Speaker Shuhan Zhang (Tsinghua University)



Accelerating Sketch-based End-Host Traffic Measurement with Automatic DPU Offloading

Xiang Chen, Xi Sun, Wenbin Zhang, Zizheng Wang, Xin Yao, Hongyan Liu and Gaoning Pan (Zhejiang University, China); Qun Huang (Peking University, China); Xuan Liu (Yangzhou University & Southeast University, China); Haifeng Zhou and Chunming Wu (Zhejiang University, China)

In modern networks, sketch-based traffic measurement offers a promising building block for monitoring real-time traffic statistics and detecting network events to guarantee quality of services for tenant applications. However, existing approaches of building sketches in end-hosts either suffer from poor packet processing performance or result in non-trivial CPU consumption in end-hosts. In this paper, we propose MPU, a framework that automatically offloads sketch-based measurement from end-host CPU cores to the emerging type of hardware, i.e., DPU. To achieve this goal, MPU offers two components. First, its sketch analyzer profiles the DPU resource consumption of heterogeneous sketches such as the minimum number of DPU cores required to achieve the maximum performance. Second, its optimization framework encodes DPU resource capacity and analysis results as constraints while formulating the problem of offloading sketches onto DPU. It optimally solves the problem to maximize sketch performance on DPU. We have implemented MPU on the NVIDIA BlueField DPU. Our testbed results indicate that MPU outperforms existing approaches with 85% lower per-packet processing latency while achieving 47% higher traffic measurement accuracy.
Speaker
Speaker biography is not available.
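
For readers unfamiliar with the term, the sketch-based measurement mentioned in the abstract above can be illustrated with a Count-Min sketch, one common sketch for per-flow counting. This is a generic, minimal illustration and not MPU's implementation; the class name, the default sizes, and the flow-key format are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Generic Count-Min sketch (illustrative; not taken from the MPU paper)."""

    def __init__(self, width=1024, depth=4):
        # width and depth are arbitrary example sizes
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent hash per row, derived by salting BLAKE2b with the row number
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        # Increment one counter in every row
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # The minimum over rows never underestimates the true count
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


sketch = CountMinSketch()
for _ in range(5):
    sketch.add("10.0.0.1->10.0.0.2")  # hypothetical flow key
flow_estimate = sketch.estimate("10.0.0.1->10.0.0.2")
```

Because hash collisions can only add to a counter, `flow_estimate` is at least the true count of 5; larger `width` and `depth` values reduce the chance of overestimation at the cost of memory.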

Effective Network-Wide Traffic Measurement: A Lightweight Distributed Sketch Deployment

Fuliang Li and Kejun Guo (Northeastern University, China); Jiaxing Shen (Lingnan University, Hong Kong); Xingwei Wang (Northeastern University, China)

Network measurement is critical for various network applications, but scaling measurement techniques to the network-wide level is challenging for existing sketch-based solutions. Centralized sketch deployment provides low resource usage but suffers from poor load balancing. In contrast, collaborative measurement achieves load balancing through traffic distribution across switches but requires high resource usage. This paper presents a novel lightweight distributed deployment framework that overcomes the limitations above. First, our framework is lightweight such that it splits sketches into segments and allocates them across forwarding paths to minimize resource usage and achieve load balance. This also enables per-packet load balancing by distributing computations across switches. Second, our framework is also optimized for load balancing by coordinating between flows and enabling finer-grained traffic division. We evaluate the proposed framework on various network topologies and different sketch deployments. Results indicate our solution matches the load balancing of collaborative measurement while approaching the low resource usage of centralized deployment. Moreover, it achieves superior performance in per-packet load balancing, which is not considered in previous deployment policies. Our work provides efficient distributed sketch deployment to strike a balance between load balance and resource usage enabling effective network-wide measurement.
Speaker Kejun Guo (Northeastern University, China)



QM-RGNN: An Efficient Online QoS Measurement Framework with Sparse Matrix Imputation for Distributed Edge Clouds

Heng Zhang, Zixuan Cui, Shaoyuan Huang, Deke Guo and Xiaofei Wang (Tianjin University, China); Wenyu Wang (Shanghai Zhuichu Networking Technologies Co., Ltd., China)

Measurements of the quality of end-to-end network services (QoS) are crucial to ensure stability, reliability, and user experience for distributed edge clouds. Existing QoS measurement methods use sparse measured QoS data to estimate unmeasured QoS data, but they suffer from low estimation accuracy when facing QoS data with high sparsity or significant volatility. Moreover, they incur high computational costs from continuous measurements and lack adaptivity. Our preliminary analysis reveals that end-to-end QoS is strongly temporal-spatial related. It inspires us to leverage partially measured QoS data to impute temporal-spatial-related unmeasured QoS data for reducing measurement costs. We propose a novel QoS measurement framework based on Residual-Graph-Neural-Network (QM-RGNN). It consists of three core components: 1) a learnable dynamic adaptive sample ratio to reduce the sampling costs; 2) a residual module introduced in the encoder-decoder model to tackle highly sparse and volatile network QoS data; 3) an online learning pattern designed to reduce continuous training costs. Experiments on two real-world edge cloud datasets demonstrate the superiority of QM-RGNN in QoS measurement. It obtains at least a 37.5% reduction of relative RMSE between ground-truth and predicted QoS data with up to 90% training cost reduction and 22.7% sampling cost reduction.
Speaker
Speaker biography is not available.

Session Chair

Deepak Nadig (Purdue University, USA)

Session E-2

E-2: Scheduling 1

Conference
2:00 PM — 3:30 PM PDT
Local
May 21 Tue, 5:00 PM — 6:30 PM EDT
Location
Regency F

Age-minimal CPU Scheduling

Mengqiu Zhou and Meng Zhang (Zhejiang University, China); Howard Yang (Zhejiang University, China & University of Illinois at Urbana Champaign (UIUC), USA); Roy Yates (Rutgers University, USA)

The proliferation of real-time status updating applications and ubiquitous mobile devices has motivated the analysis and optimization of data freshness in the context of age of information (AoI). At the same time, increasing requirements on computer performance have inspired research on CPU scheduling, with a focus on reducing energy consumption. However, since prior CPU scheduling strategies have ignored data freshness, we formulate the first CPU scheduling problem that aims to minimize the long-term average age of information, subject to an average power constraint. In particular, we optimize CPU scheduling strategies that specify when the CPU sleeps and adapt the CPU speed (clock frequency) during the execution of update-processing tasks. We consider the timely CPU scheduling problem as a constrained semi-Markov decision process problem with uncountable space. We develop a value-iteration-based algorithm and further prove its convergence in infinite space to obtain the optimal policy. Compared with existing benchmarks in terms of long-term average AoI, numerical results show that our proposed scheme can reduce the AoI by up to 53%, and obtains greater benefits when faced with a tighter power constraint. In addition, for a given AoI target, the timely CPU scheduling policy can save more than 50% on energy consumption.
Speaker
Speaker biography is not available.
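
The long-term average age of information that the abstract above minimizes can be made concrete with a small calculation. The sketch below uses the standard definition of AoI (current time minus the generation time of the freshest delivered update) and integrates the resulting sawtooth over a finite horizon. It is not the paper's scheduling algorithm; the function name, the `(generation_time, delivery_time)` input format, and the example values are assumptions.

```python
def average_aoi(updates, horizon, initial_age=0.0):
    """Time-average AoI over [0, horizon].

    updates: iterable of (generation_time, delivery_time) pairs (assumed format).
    initial_age: age at time 0, before any update is delivered.
    """
    # Process deliveries in time order, ignoring those after the horizon
    events = sorted((d, g) for g, d in updates if d <= horizon)
    area = 0.0
    t_prev = 0.0
    age = initial_age           # age at time t_prev
    latest_gen = -initial_age   # generation time of the freshest delivered update
    for d, g in events:
        if g <= latest_gen:
            continue  # a stale update does not reduce the age
        dt = d - t_prev
        area += age * dt + 0.5 * dt * dt  # age grows linearly with slope 1
        t_prev = d
        latest_gen = g
        age = d - g  # age drops to the delivered update's own age
    dt = horizon - t_prev
    area += age * dt + 0.5 * dt * dt
    return area / horizon


# One update generated at t=1 and delivered at t=2, observed over [0, 4]
example_aoi = average_aoi([(1, 2)], horizon=4)
```

In this example the age rises from 0 to 2, drops to 1 on delivery, and rises to 3 by the end of the horizon, so the area under the curve is 6 and `example_aoi` is 1.5.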

Cur-CoEdge: Curiosity-Driven Collaborative Request Scheduling in Edge-Cloud Systems

Yunfeng Zhao and Chao Qiu (Tianjin University, China); Xiaoyun Shi (Tianjin University, China); Xiaofei Wang (Tianjin Key Laboratory of Advanced Networking, Tianjin University, China); Dusit Niyato (Nanyang Technological University, Singapore); Victor C.M. Leung (Shenzhen University, China & The University of British Columbia, Canada)

The collaboration between clouds and edges unlocks the full potential of edge-cloud systems. Edge-cloud platform has brought about significant decentralization, heterogeneity, complexity, and instability. These characteristics have posed unprecedented challenges to the optimal scheduling problem in the edge-cloud system, including inaccurate decision-making and slow convergence. In this paper, we propose a curiosity-driven collaborative request scheduling scheme in edge-cloud systems, namely Cur-CoEdge. To tackle the challenge of inaccurate decision-making, we introduce a time-scale and decision-level interaction mechanism. This mechanism employs a small-large-time-scale scheduling learning framework, facilitating mutual learning between different decision levels. To address the challenge of slow convergence, we investigate the underlying reasons, such as the sparse reward-setting in reinforcement learning. In response, we develop a curiosity-driven collaborative exploration approach that fosters intrinsic curiosity in the cloud and simultaneously motivates dispatchers to explore the environment both individually and collectively. The effectiveness of this collaborative exploration is also supported by theoretical proof of convergence. Finally, we implement a prototype system on a network hardware system along with two real-world traces. Evaluations demonstrate significant improvements, with up to a 26% increase in time efficiency, a 40% rise in system throughput, and a 71% enhancement in convergence speed.
Speaker Yunfeng Zhao (Tianjin University)

Yunfeng Zhao is a PhD candidate at the College of Intelligence and Computing, Tianjin University, China. Her current research interests include edge computing, edge intelligence, and distributed machine learning.


InNetScheduler: In-network scheduling for time- and event-triggered critical traffic in TSN

Xiangwen Zhuge, Xinjun Cai, Xiaowu He, Zeyu Wang, Fan Dang, Wang Xu and Zheng Yang (Tsinghua University, China)

Time-Sensitive Networking (TSN) is an enabling technology for Industry 4.0. Traffic scheduling plays a key role for TSN to ensure low-latency and deterministic transmission of critical traffic. As industrial networks scale, TSN networks are expected to support a rising volume of both time-triggered and event-triggered critical traffic (TCT and ECT). In this work, we present InNetScheduler, the first in-network TSN scheduling paradigm that boosts the throughput, i.e., number of scheduled data flows, of both traffic types. Different from existing approaches that conduct the entire scheduling on the server, InNetScheduler leverages the computation resources on switches to promptly schedule latency-critical ECT, and delegates the computationally intensive TCT scheduling to the server. The key innovations of InNetScheduler include a Load-Aware Optimizer to mitigate ECT conflicts, a Relaxated ECT Scheduler to accelerate in-network computation, and an End-to-End Determinism Guarantee to lower scheduling jitter. We fully implement a suite of InNetScheduler-compatible TSN switches with hardware-software co-design. Extensive experiments are conducted on both simulation and physical testbeds, and the results demonstrate InNetScheduler's superior performance. By unleashing the power of in-network computation, InNetScheduler points out a direction to extend the capacity of existing industrial networks.
Speaker Xiangwen Zhuge (Tsinghua University)

Xiangwen Zhuge is currently a PhD student in Software Engineering at Tsinghua University, where he also completed his undergraduate studies. His research primarily focuses on time-sensitive networking (TSN).


Learning-based Scheduling for Information Gathering with QoS Constraints

Qingsong Liu, Weihang Xu and Zhixuan Fang (Tsinghua University, China)

The problem of wireless scheduling over unreliable channels has attracted much attention due to its great practicability in the Internet of things systems. Most previous work focuses on the throughput/energy consumption/operational cost optimization or the setting that the channel information is known a priori. In this paper, we consider a more generic setting to this problem that packets from different sources have different values, and each heterogeneous source has a distinct Quality of Service (QoS) requirement. The information about packet value and channel reliability is unknown in advance, and the controller schedules sources over time to maximize its collected packet values while providing a QoS guarantee for each source. For the stationary case where packet values are independent and identically distributed (i.i.d.), we propose an efficient learning policy based on linear-programming (LP) methodology, and show that it provably meets the QoS constraint of each source and only incurs a logarithmic regret. In the special case of known channel reliability, our algorithm can further guarantee a bounded regret. Furthermore, in the case of non-stationary packet values, we apply sliding window technique to our LP-based algorithm and prove that it still guarantees a sublinear regret while meeting each source's QoS requirement.
Speaker
Speaker biography is not available.

Session Chair

Mohamed Hefeeda (Simon Fraser University, Canada)

Session E-3

E-3: Scheduling 2

Conference
4:00 PM — 5:30 PM PDT
Local
May 21 Tue, 7:00 PM — 8:30 PM EDT
Location
Regency F

Monitoring Correlated Sources: AoI-based Scheduling is Nearly Optimal

Rudrapatna Vallabh Ramakanth, Vishrant Tripathi and Eytan Modiano (MIT, USA)

We study the design of scheduling policies to minimize monitoring error for a collection of correlated sources when only one source can be observed at any given time. We model correlated sources as a discrete-time Wiener process, where the increments are multivariate normal random variables, with a general covariance matrix that captures the correlation structure between the sources. Under a Kalman filter-based optimal estimation framework, we show that the performance of all scheduling policies that are oblivious to instantaneous error can be lower and upper bounded by the weighted sum of Age of Information (AoI) across the sources for appropriately chosen weights. We use this insight to design scheduling policies that are only a constant factor away from optimality and make the rather surprising observation that AoI-based scheduling that ignores correlation is sufficient to obtain good performance guarantees. We also derive scaling results that show no order improvement in error performance due to correlation in our model, irrespective of the degree of correlation or scheduling policy chosen. Finally, we provide simulation results to verify our claims.
Speaker
Speaker biography is not available.

Scheduling Stochastic Traffic With End-to-End Deadlines in Multi-hop Wireless Networks

Christos Tsanikidis and Javad Ghaderi (Columbia University, USA)

Scheduling deadline-constrained packets in multihop networks has received increased attention recently. However, there is very limited work on this problem for wireless networks where links are subject to interference. The existing algorithms either provide approximation ratio guarantees which diminish in quality as parameters of the network scale, or hold in an asymptotic regime when the time horizon, network bandwidth, and packet arrival rates are scaled to infinity, which limits their practicality. While attaining a constant approximation ratio has been shown to be impossible in the worst-case traffic setting, it is unclear if the same holds under the stochastic traffic, in a non-asymptotic setting. In this work, we show that, in the stochastic traffic setting, constant approximation ratio or near-optimal algorithms can be achieved. Specifically, we propose algorithms that attain Ω((1 − ε)/β) or Ω(1 − ε) fraction of the optimal value, when the number of channels is C=Ω(log(L/ε)/ε^2) or C=Ω(χlog(L/ε)/ε^3) respectively, where L is the maximum route length of packets, χ is the fractional chromatic number of the network's interference graph, and β is its interference degree. This marks the first near-optimal results under nontrivial traffic and bandwidth assumptions in a non-asymptotic regime.
Speaker
Speaker biography is not available.

Train Once Apply Anywhere: Effective Scheduling for Network Function Chains Running on FUMES

Marcel Blöcher (SAP SE & TU Darmstadt, Germany); Nils Nedderhut (Vivenu & TU Darmstadt, Germany); Pavel Chuprikov (Università della Svizzera Italiana, Switzerland); Ramin Khalili (Huawei Technologies, Germany); Patrick Eugster (Università Della Svizzera Italiana (USI), Switzerland); Lin Wang (Paderborn University, Germany)

The emergence of network function virtualization has enabled network function chaining as a flexible approach for building complex network services. However, the high degree of flexibility envisioned for orchestrating network function chains introduces several challenges to support dynamism in workloads and the environment necessary for their realization. Existing works mostly consider supporting dynamism by re-adjusting provisioning of network function instances, incurring reaction times that are prohibitively high in practice. Existing solutions to dynamic packet scheduling rely on centralized schedulers and a priori knowledge of traffic characteristics, and cannot handle changes in the environment like link failures.
We fill this gap by presenting FUMES, a reinforcement learning based distributed agent design for the runtime scheduling problem of assigning packets undergoing treatment by network function chains to network function instances. Our system design consists of multiple distributed agents that cooperatively work on the scheduling problem. A key design choice enables agents, once trained, to be applicable for unknown chains and traffic patterns including branching, and different environments including link failures. The paper presents the system design and shows its suitability for realistic deployments. We empirically compare FUMES with state-of-the-art runtime scheduling solutions showing improved scheduling decisions at lower server capacity.
Speaker Marcel Blöcher (SAP & TU Darmstadt)

Marcel Blöcher is currently an architect at SAP working on resource scheduling of SAP’s own data centers. He received his Ph.D. from TU Darmstadt (Germany) in 2021. His research interests span a broad range of resource scheduling problems.


EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning

Yijun Hao, Shusen Yang, Fang Li, Yifan Zhang, Shibo Wang and Xuebin Ren (Xi'an Jiaotong University, China)

In mobile edge computing (MEC), resource scheduling is crucial to task requests' performance and service providers' cost, involving multi-layer heterogeneous scheduling decisions. Existing schedulers typically adopt static timescales to regularly update scheduling decisions of each layer, without adaptive adjustment of timescales for different layers, resulting in potentially poor performance in practice.
We notice that the adaptive timescales would significantly improve the trade-off between the operation cost and delay performance. Based on this insight, we propose EdgeTimer, the first work to automatically generate adaptive timescales to update multi-layer scheduling decisions using deep reinforcement learning (DRL). First, EdgeTimer uses a three-layer hierarchical DRL framework to decouple the multi-layer decision-making task into a hierarchy of independent sub-tasks for improving learning efficiency. Second, to cope with each sub-task, EdgeTimer adopts a safe multi-agent DRL algorithm for decentralized scheduling while ensuring system reliability. We apply EdgeTimer to a wide range of Kubernetes scheduling rules, and evaluate it using production traces with different workload patterns. Extensive trace-driven experiments demonstrate that EdgeTimer can learn adaptive timescales, irrespective of workload patterns and built-in scheduling rules. It obtains up to 9.1x more profit than existing approaches without sacrificing the delay performance.
Speaker
Speaker biography is not available.

Session Chair

Alex Sprintson (Texas A&M University, USA)


